Biostatistics For Dummies (Monika Wahi John Pezzullo)

a list of everyone in the city we could contact, it would not be feasible to visit all of them and

measure their SBP. Nor would it be necessary. Using inferential statistics, we could draw a sample

from this population, measure their SBPs, and calculate the mean as a sample statistic. Using this

approach, we could estimate the mean SBP of the population.

But drawing a sample that is representative of the background population depends on probability (as

well as other factors). In the following sections, we explain why samples are valid but imperfect

reflections of the population from which they’re drawn. We also describe the basics of probability

distributions. For a more extensive discussion of sampling, see Chapter 6.
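
The approach described above, drawing a sample and using its mean to estimate the population mean, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population is simulated, and the population size (100,000), mean (120 mmHg), standard deviation (15 mmHg), and sample size (200) are hypothetical values chosen for the example, not figures from any study.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: simulated SBP values (mmHg) for 100,000 adults.
# The mean of 120 and SD of 15 are illustrative assumptions, not real data.
population = [random.gauss(120, 15) for _ in range(100_000)]

# Draw a simple random sample of 200 people without replacement.
sample = random.sample(population, 200)

population_mean = statistics.mean(population)  # usually unknown in practice
sample_mean = statistics.mean(sample)          # the sample statistic

print(f"Population mean SBP: {population_mean:.1f}")
print(f"Sample mean SBP:     {sample_mean:.1f}")
print(f"Difference:          {sample_mean - population_mean:+.1f}")
```

In a real study only the sample mean would be available. The simulated population mean is computed here solely to show that the estimate lands close to, but typically not exactly on, the value it estimates.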

Recognizing that sampling isn’t perfect

As used in epidemiologic research, the terms population and sample can be defined this way:

Population: All individuals in a defined target population. For example, this may be all

individuals in the United States living with a diagnosis of Type II diabetes.

Sample: A subset of the target population actually selected to participate in a study. For example,

this could be patients in the United States living with Type II diabetes who visit a particular clinic

and meet other qualification criteria for the study.

Any sample, no matter how carefully it is selected, is only an imperfect reflection of the population.

This is due to the unavoidable occurrence of random sampling fluctuations called sampling error.

To illustrate sampling error, we obtained a data set containing the number of private and public

airports in each U.S. state and the District of Columbia in 2011 from Statista (available at

https://www.statista.com/statistics/185902/us-civil-and-joint-use-airports-2008/). We started by making a histogram of the entire data set, which would be considered a census

because it contains the entire population of states. A histogram is a visualization used to examine the

distribution of numerical data and is described more extensively in Chapter 9. Here, we briefly

summarize how to read a histogram:

A histogram looks like a bar chart. It is specifically crafted to display a distribution.

The histogram’s y-axis represents the number (or frequency) of individuals in the data that fall in

the numerical ranges (known as classes) of the value being charted, which are listed across the

x-axis. In this case, the y-axis represents the number of states falling in each class.

The histogram’s x-axis represents classes, or numerical ranges of the value being charted, which

in this case is the number of airports.
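
The relationship between classes (x-axis) and frequencies (y-axis) can be made concrete by tallying values into classes by hand. The sketch below is a minimal illustration: the ten airport counts and the class width of 100 are hypothetical values made up for the example, not the Statista data.

```python
from collections import Counter

# Hypothetical airport counts for ten states (illustrative values only).
airport_counts = [12, 45, 78, 103, 150, 160, 210, 95, 30, 310]

class_width = 100  # assumed width of each class (numerical range)

# Assign each value to the lower bound of its class, then tally per class.
frequencies = Counter(
    (count // class_width) * class_width for count in airport_counts
)

for lower in sorted(frequencies):
    upper = lower + class_width - 1
    # The class is what the x-axis shows; the tally is the bar height (y-axis).
    print(f"{lower:>3}-{upper:<3} | {'#' * frequencies[lower]} ({frequencies[lower]})")
```

Each printed row corresponds to one bar of a histogram: the range on the left is the class, and the number of `#` marks is the frequency of states falling in that class.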

We first made a histogram of the census. Then we took four random samples of 20 states and made a

histogram of each of the samples. Figure 3-1 shows the results.
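
The same exercise can be sketched numerically. The example below assumes a simulated census of 51 airport counts (50 states plus the District of Columbia) because the actual Statista values are not reproduced here, and it compares sample means with the census mean rather than drawing histograms. The value range of 10 to 500 is a hypothetical choice.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical census: airport counts for 51 jurisdictions (50 states plus DC).
# These values are simulated for illustration; they are not the Statista data.
census = [random.randint(10, 500) for _ in range(51)]
census_mean = statistics.mean(census)
print(f"Census mean: {census_mean:.1f}")

# Take four random samples of 20 and compare each sample mean to the census mean.
sample_means = []
for i in range(1, 5):
    sample = random.sample(census, 20)  # sampling without replacement
    sample_mean = statistics.mean(sample)
    sample_means.append(sample_mean)
    error = sample_mean - census_mean
    print(f"Sample {i} mean: {sample_mean:.1f} (sampling error {error:+.1f})")
```

Each of the four samples is drawn from the same census, yet their means generally differ from one another and from the census mean. Those differences are the sampling error described earlier in this section.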